Domain adaptation aims to transfer the knowledge acquired by models trained on (data-rich) source domains to (low-resource) target domains, for which a popular method is invariant representation learning. While they have been studied extensively for classification and regression problems, how they apply to ranking problems, where the data and metrics have a list structure, is not well understood. Theoretically, we establish a domain adaptation generalization bound for ranking under listwise metrics such as MRR and NDCG. The bound suggests an adaptation method via learning list-level domain-invariant feature representations, whose benefits are empirically demonstrated by unsupervised domain adaptation experiments on real-world ranking tasks, including passage reranking. A key message is that for domain adaptation, the representations should be analyzed at the same level at which the metric is computed, as we show that learning invariant representations at the list level is most effective for adaptation on ranking problems.
translated by 谷歌翻译
Transformers have been essential to pretraining success in NLP. Other architectures have been used, but require attention layers to match benchmark accuracy. This work explores pretraining without attention. We test recently developed routing layers based on state-space models (SSM) and model architectures based on multiplicative gating. Used together these modeling choices have a large impact on pretraining accuracy. Empirically the proposed Bidirectional Gated SSM (BiGS) replicates BERT pretraining results without attention and can be extended to long-form pretraining of 4096 tokens without approximation.
translated by 谷歌翻译
Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention
translated by 谷歌翻译
As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially on simulating a text-guided style with both the appearance and the geometry altered simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a simple text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the target style characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with the directional constraint to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress cloudy artifacts and geometry noises which arise easily when the density field is transformed during geometry stylization. Through extensive experiments on various styles, we demonstrate that our method is effective and robust regarding both single-view stylization quality and cross-view consistency. The code and more results can be found in our project page: https://cassiepython.github.io/nerfart/.
translated by 谷歌翻译
The identification of addiction-related circuits is critical for explaining addiction processes and developing addiction treatments. And models of functional addiction circuits developed from functional imaging are an effective tool for discovering and verifying addiction circuits. However, analyzing functional imaging data of addiction and detecting functional addiction circuits still have challenges. We have developed a data-driven and end-to-end generative artificial intelligence(AI) framework to address these difficulties. The framework integrates dynamic brain network modeling and novel network architecture networks architecture, including temporal graph Transformer and contrastive learning modules. A complete workflow is formed by our generative AI framework: the functional imaging data, from neurobiological experiments, and computational modeling, to end-to-end neural networks, is transformed into dynamic nicotine addiction-related circuits. It enables the detection of addiction-related brain circuits with dynamic properties and reveals the underlying mechanisms of addiction.
translated by 谷歌翻译
Existing approaches for vision-and-language navigation (VLN) are mainly based on cross-modal reasoning over discrete views. However, this scheme may hamper an agent's spatial and numerical reasoning because of incomplete objects within a single view and duplicate observations across views. A potential solution is mapping discrete views into a unified birds's-eye view, which can aggregate partial and duplicate observations. Existing metric maps could achieve this goal, but they suffer from less expressive semantics (e.g. usually predefined labels) and limited map size, which weakens an agent's language grounding and long-term planning ability. Inspired by the robotics community, we introduce hybrid topo-metric maps into VLN, where a topological map is used for long-term planning and a metric map for short-term reasoning. Beyond mapping with more expressive deep features, we further design a pre-training framework via the hybrid map to learn language-informed map representations, which enhances cross-modal grounding and facilitates the final language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based route for VLN, and the proposed method sets the new state-of-the-art on three VLN benchmarks.
translated by 谷歌翻译
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images. Spatial-domain information has been widely exploited to implement image SR, so a new trend is to involve frequency-domain information in SR tasks. Besides, image SR is typically application-oriented and various computer vision tasks call for image arbitrary magnification. Therefore, in this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network. First, we statistically analyze LR-HR image pairs of several datasets under different scale factors and find that the high-frequency spectra of different images under different scale factors suffer from different degrees of degradation, but the valid low-frequency spectra tend to be retained within a certain distribution range. Then, based on this finding, we devise an adaptive scale-aware feature division mechanism using deep reinforcement learning, which can accurately and adaptively divide the frequency spectrum into the low-frequency part to be retained and the high-frequency one to be recovered. Finally, we design a scale-aware feature recovery module to capture and fuse multi-level features for reconstructing the high-frequency spectrum at arbitrary scale factors. Extensive experiments on public datasets show the superiority of our method compared with state-of-the-art methods.
translated by 谷歌翻译
Vehicle re-identification (Re-ID) is a critical component of the autonomous driving perception system, and research in this area has accelerated in recent years. However, there is yet no perfect solution to the vehicle re-identification issue associated with the car's surround-view camera system. Our analysis identifies two significant issues in the aforementioned scenario: i) It is difficult to identify the same vehicle in many picture frames due to the unique construction of the fisheye camera. ii) The appearance of the same vehicle when seen via the surround vision system's several cameras is rather different. To overcome these issues, we suggest an integrative vehicle Re-ID solution method. On the one hand, we provide a technique for determining the consistency of the tracking box drift with respect to the target. On the other hand, we combine a Re-ID network based on the attention mechanism with spatial limitations to increase performance in situations involving multiple cameras. Finally, our approach combines state-of-the-art accuracy with real-time performance. We will soon make the source code and annotated fisheye dataset available.
translated by 谷歌翻译
We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL). X-Prompt instructs an LLM with not only NL but also an extensible vocabulary of imaginary words that are introduced to help represent what NL words hardly describe, allowing a prompt to be more descriptive. Like NL prompts, X-Prompt is out-of-distribution (OOD) robust, for which we propose context-guided learning with prompt augmentation to learn its imaginary words for general usability, enabling them to use in different prompt contexts for fine-grain specifications. The promising results of X-Prompt demonstrate its potential of approaching advanced interaction between humans and LLMs to bridge their communication gap.
translated by 谷歌翻译
Network structure evolves with time in the real world, and the discovery of changing communities in dynamic networks is an important research topic that poses challenging tasks. Most existing methods assume that no significant change in the network occurs; namely, the difference between adjacent snapshots is slight. However, great change exists in the real world usually. The great change in the network will result in the community detection algorithms are difficulty obtaining valuable information from the previous snapshot, leading to negative transfer for the next time steps. This paper focuses on dynamic community detection with substantial changes by integrating higher-order knowledge from the previous snapshots to aid the subsequent snapshots. Moreover, to improve search efficiency, a higher-order knowledge transfer strategy is designed to determine first-order and higher-order knowledge by detecting the similarity of the adjacency matrix of snapshots. In this way, our proposal can better keep the advantages of previous community detection results and transfer them to the next task. We conduct the experiments on four real-world networks, including the networks with great or minor changes. Experimental results in the low-similarity datasets demonstrate that higher-order knowledge is more valuable than first-order knowledge when the network changes significantly and keeps the advantage even if handling the high-similarity datasets. Our proposal can also guide other dynamic optimization problems with great changes.
translated by 谷歌翻译